
Update maxToolCalls and minToolCalls in eval.yaml #65

Merged
janisz merged 2 commits into main from
Update-maxToolCalls-and-minToolCalls-in-eval.yaml
Mar 18, 2026

Conversation

@janisz
Contributor

@janisz janisz commented Mar 17, 2026

Description

Adjust maxToolCalls and minToolCalls in eval.yaml to match the gpt micro results.

Validation

CI

@codecov-commenter

codecov-commenter commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.49%. Comparing base (18d456a) to head (0c1322a).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #65   +/-   ##
=======================================
  Coverage   78.49%   78.49%           
=======================================
  Files          28       28           
  Lines        1223     1223           
=======================================
  Hits          960      960           
  Misses        223      223           
  Partials       40       40           

☔ View full report in Codecov by Sentry.

@github-actions

github-actions Bot commented Mar 17, 2026

E2E Test Results

Commit: 0c1322a
Workflow Run: View Details

=== Evaluation Summary ===

  ✓ list-clusters (assertions: 3/3)
  ✓ cve-detected-workloads (assertions: 3/3)
  ✓ cve-detected-clusters (assertions: 3/3)
  ✓ cve-nonexistent (assertions: 3/3)
  ✓ cve-cluster-does-exist (assertions: 3/3)
  ~ cve-cluster-does-not-exist (assertions: 2/3)
      - ToolsUsed: Required tool not called: server=stackrox-mcp, tool=, pattern=list_clusters
  ✓ cve-clusters-general (assertions: 3/3)
  ✓ cve-cluster-list (assertions: 3/3)
  ✓ cve-log4shell (assertions: 3/3)
  ✓ cve-multiple (assertions: 3/3)
  ✓ rhsa-not-supported (assertions: 2/2)

Tasks:      11/11 passed (100.00%)
Assertions: 31/32 passed (96.88%)
Agent used tokens:
  Input:  30644 tokens
  Output: 21689 tokens
Judge used tokens:
  Input:  9713 tokens
  Output: 13173 tokens
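
For readers checking the totals, here is a minimal sketch of how the assertion line can be recomputed from the per-task counts listed above. The `results` dictionary and variable names are illustrative only and are not part of the mcpchecker tooling; the counts are copied from the summary.

```python
# Per-task (passed, total) assertion counts, copied from the summary above.
# The dictionary layout is a hypothetical convenience, not mcpchecker output.
results = {
    "list-clusters": (3, 3),
    "cve-detected-workloads": (3, 3),
    "cve-detected-clusters": (3, 3),
    "cve-nonexistent": (3, 3),
    "cve-cluster-does-exist": (3, 3),
    "cve-cluster-does-not-exist": (2, 3),  # the ToolsUsed assertion failed
    "cve-clusters-general": (3, 3),
    "cve-cluster-list": (3, 3),
    "cve-log4shell": (3, 3),
    "cve-multiple": (3, 3),
    "rhsa-not-supported": (2, 2),
}

passed = sum(p for p, _ in results.values())
total = sum(t for _, t in results.values())
percentage = f"{100 * passed / total:.2f}"

# Prints: Assertions: 31/32 passed (96.88%)
print(f"Assertions: {passed}/{total} passed ({percentage}%)")
```

The single missing assertion comes from cve-cluster-does-not-exist, which matches the `~` marker in the summary.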

Comment thread e2e-tests/mcpchecker/eval.yaml Outdated
Co-authored-by: Tomasz Janiszewski <janiszt@gmail.com>
@janisz janisz requested a review from mtodor March 17, 2026 12:56
@janisz janisz merged commit 872f3fc into main Mar 18, 2026
6 checks passed
@janisz janisz deleted the Update-maxToolCalls-and-minToolCalls-in-eval.yaml branch March 18, 2026 12:59

3 participants